Abstract

The general Markov model of the evolution of biological sequences along a tree leads 
to a parameterization of an algebraic variety. Understanding this variety and the 
polynomials, called phylogenetic invariants, which vanish on it, is a problem within 
the broader area of Algebraic Statistics. For an arbitrary trivalent tree, we determine 
the full ideal of invariants for the 2-state model, establishing a conjecture of Pachter- 
Sturmfels. For the κ-state model, we reduce the problem of determining a defining
set of polynomials to that of determining a defining set for a 3-leaf tree. Along the 
way, we prove several new cases of a conjecture of Garcia-Stillman-Sturmfels on 
certain statistical models on star trees, and reduce their conjecture to a family of 
subcases. 
1 Introduction

An important problem arising in modern biology is that of sequence-based 
phylogenetic inference. Suppose we obtain a collection of biological sequences, 
such as genomic DNA, from currently extant species, or taxa. Assuming these 
sequences evolved from a common ancestral sequence, how can we infer a tree 
that describes their evolutionary descent? The use of algebraic methods for 
this problem was first proposed in 1987 in independent works by Lake [14], 
and Cavender and Felsenstein [5]. Recently, Garcia, Stillman, and Sturmfels
[10] initiated a more general algebraic study of statistical models, of which 
phylogenetic models are a particularly interesting example. In this new field, 
Algebraic Statistics, the viewpoints of algebraic geometry are central to inves- 
tigations of probabilistic models arising in applied contexts. 

In model-based phylogenetics, evolution is usually assumed to proceed along 
a binary tree from an ancestral sequence at the root of the tree, to sequences 
found in the taxa, which label the leaves of the tree. The κ = 4 bases A, C, G, T
of which DNA is composed are viewed as states of random variables. Each site 
in the sequence might be assumed to evolve i.i.d., so that different sites can 
be viewed as trials of the same process. Probabilities of the various base sub- 
stitutions along an edge of the tree can then be given by a Markov transition 
matrix along that edge. Additional biologically reasonable, or mathematically 
convenient, assumptions as to the form of these transition matrices are often 
imposed. The basic problem is to assume some model along these lines and 
use it to infer, from observations of DNA sequences only at the leaves, a tree 
topology that might describe their evolutionary descent. An excellent overview 
of the field of phylogenetics is provided by the recent volume of Felsenstein 
[9]. 

In the phylogenetics literature, a phylogenetic invariant for a particular model 
and tree is a polynomial that vanishes on all joint distributions of bases at 
the leaves that arise from the model, regardless of the values of the model 
parameters. In the terminology of algebraic geometry, the model and tree 
imply a parameterization of a dense subset of a variety, and phylogenetic 
invariants are the elements of the prime ideal defining that variety. 

For applications, one might hope that the near-vanishing of phylogenetic in- 
variants on observed frequencies of bases in DNA data could be used as a 
test of model-fit and/or tree topology. Although this idea remains undevel- 
oped for practical use, phylogenetic invariants have already provided means 
for addressing more theoretical questions in phylogenetics, such as the nature 
of maximum likelihood points [6], and the identifiability of certain models [4]. 

In this paper we investigate the phylogenetic variety for the general Markov 
model of base substitution for an arbitrary tree, a detailed specification of 
which will be given in the next section. This model was also the focus of the 
related investigations [1,2]. 

One main result is the proof of Conjecture 13 of Pachter and Sturmfels [16] on 
the ideal of phylogenetic invariants for the general Markov model in the case 
of κ = 2 states: the invariants arising from all 3 × 3 minors of '2-dimensional
flattenings' of an array along the edges of a binary n-taxon tree T generate
the full ideal. This is Theorem 4, which is stated more fully in Section 4 and
proved in Section 8.

Fig. 1. A 5-taxon tree

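For an explicit example of this theorem, consider the 5-taxon tree of Figure
1. For the 2-state model, denote the states by 0 and 1. A 2 × 2 × 2 × 2 × 2
tensor P encodes the probabilities of various states at the leaves, where
$P(i_1, i_2, i_3, i_4, i_5) = p_{i_1 i_2 i_3 i_4 i_5}$ is the joint probability of observing state $i_j$ in the
sequence at leaf $a_j$, j = 1, ..., 5. Now P has two natural flattenings according
to the partitions of leaves produced by deleting an internal edge of the tree.
The partitions, or splits, are $\{\{a_1, a_2\}, \{a_3, a_4, a_5\}\}$ and $\{\{a_1, a_2, a_3\}, \{a_4, a_5\}\}$,
and the corresponding flattenings are the 4 × 8 matrix

\[
\begin{pmatrix}
p_{00000} & p_{00001} & p_{00010} & \cdots & p_{00111} \\
p_{01000} & p_{01001} & p_{01010} & \cdots & p_{01111} \\
p_{10000} & p_{10001} & p_{10010} & \cdots & p_{10111} \\
p_{11000} & p_{11001} & p_{11010} & \cdots & p_{11111}
\end{pmatrix},
\]

whose rows are indexed by the states at $a_1, a_2$ and whose columns are indexed by
the states at $a_3, a_4, a_5$, and the 8 × 4 matrix

\[
\begin{pmatrix}
p_{00000} & p_{00001} & p_{00010} & p_{00011} \\
p_{00100} & p_{00101} & p_{00110} & p_{00111} \\
\vdots & \vdots & \vdots & \vdots \\
p_{11100} & p_{11101} & p_{11110} & p_{11111}
\end{pmatrix},
\]

whose rows are indexed by the states at $a_1, a_2, a_3$ and whose columns are indexed
by the states at $a_4, a_5$.

The theorem states that the 3 × 3 minors of these two matrices generate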
the prime ideal of all phylogenetic invariants for the 2-state general Markov 
model on this tree. In particular, this ideal has a natural set of generators that 
correspond to the splits, and therefore to specific topological features of the 
tree. 

We note that this theorem provides the first determination of all phylogenetic
invariants for an arbitrary binary tree for any non-group-based model.
Sturmfels and Sullivant [19] solved the similar problem for group-based mod- 
els, using the Hadamard conjugation ([12,13,8,20]) to recognize the varieties 
as toric. While algebraic models intermediate to the group-based and gen- 
eral ones have been introduced recently [3,7], our knowledge of them is less 
complete. 

We also investigate the question of the explicit determination of the phylogenetic
variety and ideal for larger κ. We show in Theorem 17 that if we have a
set of polynomials whose zero set is the variety for the 3-taxon tree, then we 
can construct a set of polynomials whose zero set is the variety for any binary 
n-taxon tree. Similar to the conjecture of [16], our constructions involve 'flat- 
tenings', though both 2- and 3-dimensional ones are now needed, as might be 
expected from [1]. Thus the only remaining obstruction to our determination 
of a defining set of polynomials for the phylogenetic variety for any binary 
tree and any number of states κ is the determination of a defining set for the
3-taxon tree variety. 

In Conjecture 5 we suggest that the same construction yielding set-theoretic 
defining polynomials for the variety would yield generators of the full prime 
ideal vanishing on the variety, provided we begin with generators of the ideal 
for the 3-taxon tree. This is the analog for arbitrary κ of the Pachter-Sturmfels
conjecture. 

Theorem 4, Theorem 17, and Conjecture 5, as well as the Sturmfels-Sullivant 
group-based result, can all be viewed as statements that the phylogenetic 
varieties and ideals arise from the 'local structure' of the tree. Exploiting this 
observation to provide better ways of characterizing the statistical support a 
data set might provide for specific local tree features would be interesting work 
for the future. In particular, invariants might provide a means of characterizing 
support for particular splits or tripartitions of the taxa. 

Despite our primary focus on phylogenetic models, to prove Theorem 17 we 
must consider certain other statistical models on star trees. In Section 6, we 
therefore investigate models with a κ-state hidden variable associated to the
internal node, and $l_i$-state observed variables associated to the n leaves. Such
models are of course interesting in applications outside of phylogenetics, as 
they are examples of rather common 'mixture models' in statistics. Following 
[10], they are termed hidden naive Bayes models. 

Our work here focuses on such models in the case that for each i the number of
states $l_i$ is at least as large as the number of hidden states κ. Theorems 10 and
11 describe how set-theoretic and ideal-theoretic defining sets of the associated
varieties can be deduced from set-theoretic and ideal-theoretic defining sets of
the variety of the related model which has κ-state variables on each leaf.
As a consequence of this work on star tree models, in Corollary 14 we prove
several cases of Conjecture 21 of [10], on ideal generators for the hidden naive
Bayes model with κ = 2. While one of these cases, for the 3-leaf tree, has
been recently proved in [15], even for that case our argument is different, 
and perhaps more direct. Moreover, our work indicates that establishing the 
special cases mentioned in Conjecture 16 of this paper is sufficient to prove 
the full conjecture of [10]. 

Before obtaining these results, we begin with several background sections. In 
Section 2, we define the phylogenetic variety for the general Markov model 
through the natural parameterization arising from modeling molecular evolu- 
tion along a tree T by associating Markov matrices to each edge. In Section 
3 we then give a more convenient parameterization of (a dense subset of) the 
cone over the phylogenetic variety, which associates an arbitrary κ × κ matrix
to each edge of T, rather than a Markov matrix. Section 4 introduces flat- 
tenings of tensors along edges and vertices of trees, while Section 5 develops 
the relationship of a form of multiplication of tensors to the varieties under 
investigation. Subsequent sections contain our primary results. 

Finally, we note that most of the results on phylogenetic trees in this paper 
hold not only for binary trees, but also under the weaker assumption that 
each vertex have valency at least three. An important exception is Theorem 
4, where the binary assumption is critical to our proof. 



2 Affine and Projective Phylogenetic Varieties 

Let T denote an n-taxon tree, by which we mean a tree with all internal 
vertices unlabeled and of valency at least 3, with n leaves labeled by taxa 
$a_1, \ldots, a_n$. We will sometimes specify in addition that T is binary (i.e., all
internal vertices are trivalent), as this assumption is needed for some of our 
results, and is often the case of primary interest in phylogenetics. 

Choosing as a root any vertex r of T, either internal or a leaf, denote the rooted
tree by $T^r$. Parameters for the κ-state general Markov model of sequence
evolution on $T^r$ consist of a root distribution vector $\pi_r = (\pi_1, \pi_2, \ldots, \pi_\kappa)$ with
non-negative entries summing to 1, together with a Markov matrix $M_e$,
which has non-negative entries with each row summing to 1, for each of the
2n − 3 edges e of $T^r$ directed away from r.

This models the evolution of biological sequences as follows. The κ states
[κ] = {1, 2, ..., κ} correspond to the alphabet from which sequences are composed.
The root r represents the most recent common ancestor of the currently
extant taxa, and other internal nodes of the tree represent most recent common
ancestors of those taxa separated from the root by that node. The root
distribution vector encodes the frequencies $\pi_i$ with which each state i occurs
in an ancestral sequence at r. The (i, j)-entry of a Markov matrix along a par- 
ticular edge of T directed away from r is the conditional probability of state 
i changing to state j at any particular site in the sequence during evolution 
along that edge. Thus each site in a biological sequence is assumed to evolve 
independently, according to the same process (i.i.d.). Note the biological term 
'sequence' as used here implies no mathematical structure other than a cor- 
respondence between sites based on ancestry; except for matching sites by 
common ancestry, the ordering of the sites within the sequences is irrelevant. 

Suppose a rooted n-taxon tree $T^r$ has |E| edges, so that for a binary tree |E| =
2n − 3. For the general Markov model of evolution along $T^r$ the parameter space
S can thus be identified with a subset of $[0, 1]^N$, where N = (κ − 1) + |E|κ(κ − 1).
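
As a quick arithmetic check of this count, the formula for N can be evaluated directly. The sketch below is only illustrative: the function name `num_parameters` is hypothetical, and the example values (κ = 4 for DNA on a binary 5-taxon tree, which has 2n − 3 = 7 edges) are chosen for illustration.

```python
def num_parameters(kappa, num_edges):
    """N = (kappa - 1) + |E| * kappa * (kappa - 1).

    The root distribution contributes kappa - 1 free entries, and each
    Markov matrix contributes kappa * (kappa - 1), since its rows sum to 1.
    """
    return (kappa - 1) + num_edges * kappa * (kappa - 1)


n = 5                     # number of taxa on a binary tree (illustrative)
num_edges = 2 * n - 3     # |E| = 2n - 3 for a binary tree
N = num_parameters(4, num_edges)
print(N)
```

For these values the count is 3 + 7 · 12 = 87 parameters, against $\kappa^n = 1024$ entries in the joint distribution tensor.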

Furthermore, there is a polynomial map $\phi_r : S \to [0, 1]^L$, with $L = \kappa^n$, giving
the joint distribution of states in sequences at the leaves resulting from any
parameter choice. We view points in $\phi_r(S)$ or $\mathbb{C}^L$ as κ × ⋯ × κ tensors, with the
ith index referring to the state at leaf $a_i$. Indices thus typically range through
[κ], and a fixed ordering of the taxa is reflected in the ordering of indices
of tensors. Assuming the model adequately reflects real molecular evolution,
from biological sequence data we can estimate entries of $\phi_r(s)$, but usually
have little or no direct information about the parameters s.

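The map $\phi_r$ is explicitly given by $\phi_r(s) = P$, with

\[
P(i_1, \ldots, i_n) = \sum_{(b_v) \in H} \pi_r(b_r) \prod_{e} M_e\bigl(b_{s(e)}, b_{f(e)}\bigr), \tag{1}
\]

where the product is taken over all edges e of $T^r$ directed away from r, edge
e has initial vertex s(e) and final vertex f(e) and associated Markov matrix
$M_e$, and the sum is taken over the set

\[
H = \{(b_v)_{v \in \mathrm{Vert}(T)} \mid b_v \in [\kappa] \text{ if } v \ne a_j,\ b_v = i_j \text{ if } v = a_j\} \subset [\kappa]^{2n-2}.
\]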

Thus H represents the set of all 'histories' consistent with the specified states 
at the leaves. 
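
To make the sum over histories concrete, the following Python sketch evaluates it by brute force on a small example. It is an illustration under stated assumptions rather than part of the development above: the function `joint_distribution`, the vertex names, and the numerical parameters are all hypothetical, states are numbered from 0 rather than 1, and exact rational arithmetic is used so that the entries can be checked to sum to 1.

```python
from fractions import Fraction
from itertools import product


def joint_distribution(k, root, edges, leaves, pi, M):
    """Return the tensor P as a dict keyed by tuples of leaf states.

    edges:  list of (start, end) pairs directed away from the root.
    leaves: list of leaf names, in the fixed ordering of the taxa.
    pi:     root distribution, a list of length k.
    M:      dict mapping each edge to a k x k row-stochastic matrix.
    """
    vertices = sorted({v for e in edges for v in e})
    internal = [v for v in vertices if v not in leaves]
    P = {}
    for leaf_states in product(range(k), repeat=len(leaves)):
        total = 0
        # One history for every assignment of states to internal vertices.
        for hidden in product(range(k), repeat=len(internal)):
            b = dict(zip(leaves, leaf_states))
            b.update(zip(internal, hidden))
            term = pi[b[root]]
            for (s, f) in edges:
                term *= M[(s, f)][b[s]][b[f]]
            total += term
        P[leaf_states] = total
    return P


# Illustrative 2-state model on the 3-taxon tree, rooted at its internal vertex.
F = Fraction
leaves = ["a1", "a2", "a3"]
edges = [("r", "a1"), ("r", "a2"), ("r", "a3")]
pi = [F(1, 3), F(2, 3)]
M = {
    ("r", "a1"): [[F(3, 4), F(1, 4)], [F(1, 5), F(4, 5)]],
    ("r", "a2"): [[F(1, 2), F(1, 2)], [F(1, 6), F(5, 6)]],
    ("r", "a3"): [[F(2, 3), F(1, 3)], [F(1, 4), F(3, 4)]],
}
P = joint_distribution(2, "r", edges, leaves, pi, M)
```

The same function accepts a root placed at a leaf, in which case the root state is fixed by the leaf index and only the remaining internal vertices are summed over.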

The map $\phi_r$ can also be defined inductively, using matrix algebra, by viewing
the tree $T^r$ as built up from smaller trees by the addition of pairs of terminal
edges, as we now explain. For this purpose, we first assume T is binary.

A cherry of T is a pair of distinct leaves $a_{i_1}, a_{i_2}$ whose incident edges contain
a common (internal) vertex of T. For n ≥ 3, any binary n-taxon tree contains
at least two cherries, and any rooted binary n-taxon tree contains at least one
cherry in which neither taxon of the cherry is the root of the tree.

For n ≥ 3 let $T^r_n = T^r$ denote a rooted binary n-taxon tree labeled by taxa
$a_1, \ldots, a_n$. Choose a cherry of $T^r_n$ which does not contain the root r. Let $T^r_{n-1}$
denote the rooted binary (n − 1)-taxon tree obtained by deleting the cherry
and its two incident edges from $T^r_n$ and labeling as a new taxon, say b, the
(formerly internal) common vertex of the incident edges.

Applying this definition recursively, we obtain from $T^r$ a sequence of rooted
trees $T^r_n, T^r_{n-1}, \ldots, T^r_2$, which of course may depend on some arbitrary choices
of cherries. We assume such choices have been made and fixed.

The map $\phi_r$ described above can now be defined inductively as follows:

A rooted 2-taxon tree has only one edge e directed away from r, so with
parameters $\pi_r$ and $M_e$,

\[
\phi_r(\pi_r; M_e) = \mathrm{diag}(\pi_r) M_e,
\]

where diag(v) denotes the square matrix with v on its main diagonal and
zeros elsewhere.

To define $\phi_r$ for $T^r_m$, direct edges e away from r and suppose parameters

\[
s = (\pi_r; \{M_e\})
\]

for $T^r_m$ are given. Then one obtains parameters $\tilde{s}$ for $T^r_{m-1}$ by simply discarding
from s the two Markov matrices associated to the edges of $T^r_m$ not appearing
in $T^r_{m-1}$. Inductively, we may assume $\tilde{\phi}_r : \tilde{S} \to [0, 1]^{\kappa^{m-1}}$, the map giving the
joint distribution of states at leaves for $T^r_{m-1}$ as a function of parameters on
$T^r_{m-1}$, has been given. For convenience, we also assume that the taxa of $T^r_m$ are
$a_1, a_2, \ldots, a_m$ and those of $T^r_{m-1}$ are $a_1, a_2, \ldots, a_{m-2}, b$, with the given orderings,
and that $e_1$ and $e_2$ are the edges of $T^r_m$ containing $a_{m-1}, a_m$ respectively.

Then $\phi_r(s) = P$, where P is an m-dimensional tensor with 2-dimensional slices
given by first letting $\tilde{P} = \tilde{\phi}_r(\tilde{s})$, $v = \tilde{P}(i_1, \ldots, i_{m-2}, \cdot)$ and setting

\[
P(i_1, \ldots, i_{m-2}, \cdot, \cdot) = M_{e_1}^T \,\mathrm{diag}(v)\, M_{e_2}. \tag{2}
\]

One can check that this definition of $\phi_r$ agrees with our earlier one, and so is
independent of the choice of cherries defining the sequence $T^r_2, T^r_3, \ldots, T^r_n$.
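
The agreement of the two definitions can be checked by hand in the smallest case. The Python sketch below is an illustration only: it takes the 3-taxon tree rooted at the leaf $a_1$, with internal vertex b, builds the tensor once by the cherry step of Eq. (2) and once by the sum over histories of Eq. (1), and compares them. All names and numerical parameters are hypothetical, and states are numbered from 0.

```python
from fractions import Fraction as F
from itertools import product

k = 2
pi = [F(1, 3), F(2, 3)]                          # root distribution at a1
M0 = [[F(3, 4), F(1, 4)], [F(1, 5), F(4, 5)]]    # edge a1 -> b
M1 = [[F(1, 2), F(1, 2)], [F(1, 6), F(5, 6)]]    # edge e1: b -> a2
M2 = [[F(2, 3), F(1, 3)], [F(1, 4), F(3, 4)]]    # edge e2: b -> a3

# Base case on the 2-taxon tree with taxa a1, b: diag(pi) M0.
P_tilde = [[pi[i] * M0[i][j] for j in range(k)] for i in range(k)]


def cherry_step(P_tilde, M1, M2):
    """Slice i1 of P is M1^T diag(v) M2, where v = P_tilde[i1]."""
    P = {}
    for i1, v in enumerate(P_tilde):
        for i2, i3 in product(range(k), repeat=2):
            P[(i1, i2, i3)] = sum(M1[j][i2] * v[j] * M2[j][i3] for j in range(k))
    return P


P_inductive = cherry_step(P_tilde, M1, M2)

# Direct sum over histories: the root a1 is in state i1, the hidden vertex b in state j.
P_direct = {
    (i1, i2, i3): sum(pi[i1] * M0[i1][j] * M1[j][i2] * M2[j][i3] for j in range(k))
    for i1, i2, i3 in product(range(k), repeat=3)
}
```

Here `P_inductive` and `P_direct` coincide entry by entry, as the two constructions sum the same products in a different order.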

This approach to an inductive definition of $\phi_r$ can be extended to the case
of non-binary trees as follows. For an arbitrary tree $T^r$, let $\tilde{T}^r$ denote any
binary tree which resolves $T^r$, in the sense that $T^r$ can be obtained from
$\tilde{T}^r$ by contraction of some edges. Extend a choice of parameters s on $T^r$ to
parameters $\tilde{s}$ on $\tilde{T}^r$ by assigning the identity matrix to those edges of $\tilde{T}^r$
which are collapsed in $T^r$. Then since $\phi_r(s) = \tilde{\phi}_r(\tilde{s})$, the inductive definition
for binary trees can be applied for $\tilde{T}^r$.

Lemma 1 For any n-taxon tree T, the inductive definition of $\phi_r$ based on Eq.
(2) and outlined above agrees with the definition in Eq. (1).

We also denote by $\phi_r$ the unique extension of this map to a polynomial map
$\phi_r : \mathbb{C}^N \to \mathbb{C}^L$. The affine phylogenetic variety V(T) for the general Markov
model on T is defined as the closure in $\mathbb{C}^L$ of the image of $\phi_r$. (Note that
this closure may be taken using either the Zariski topology or the standard
topology on $\mathbb{C}^L$, as the two closures will agree for the image of a polynomial
map.) As has been shown elsewhere [17,1], this definition is independent of
the choice of the root r. V(T) is irreducible, as it is the zero set of a prime
ideal, the kernel of the map between polynomial rings associated to $\phi_r$.

Now one readily sees the image of $\phi_r$ lies on the hyperplane defined by the
trivial phylogenetic invariant $\sum_{\mathbf{i} \in [\kappa]^n} p_{\mathbf{i}} - 1 = 0$. It is therefore natural to pass
to the projective phylogenetic variety in $\mathbb{P}^{L-1}$ by taking a projective closure.
We denote this by V(T) also, making clear by context whether the affine or 
projective version is meant. 

The phylogenetic ideals of all polynomials vanishing on the affine phylogenetic 
variety or vanishing on the projective phylogenetic variety are of course closely 
related. Generators of the homogeneous ideal of the projective variety, 
together with the trivial invariant, generate the ideal of the affine variety. 
Conversely, any homogeneous polynomial in the ideal of the affine variety is in 
the homogeneous ideal of the projective variety. Thus identifying phylogenetic 
invariants for the general Markov model means identifying those polynomials 
vanishing on the projective phylogenetic variety. 



3 Reparameterization 

For any projective variety $V \subset \mathbb{P}^m$, let $CV \subset \mathbb{C}^{m+1}$ denote the cone over V,
that is, the union of the lines represented by points in V. Equivalently, CV is 
the affine variety defined by the same polynomials as V. 

A dense subset of the cone CV(T) admits a parameterization that will be 
more useful than the parameterization $\phi_r$ above. This new parameterization
simplifies many arguments, since it allows matrices with any row sums to 
be associated to edges, and no longer requires a root distribution, or even a 
specification of a root. 
Definition 2 Consider an n-taxon tree T with |E| edges. Let $U = \mathbb{C}^K$ with
$K = |E|\kappa^2$. Choose any vertex r of T as a root, directing all edges of $T^r$ away
from r. View $u \in U$ as a (2n − 3)-tuple of complex κ × κ matrices $M_e$, one
for each edge e of $T^r$.

Then, in the case that T is binary, let $\psi : U \to \mathbb{C}^L$ be given inductively
as follows, using the sequence of trees $T^r = T^r_n, T^r_{n-1}, \ldots, T^r_2$ chosen in the
discussion leading to Lemma 1:

If n = 2, $\psi(u) = \psi(M_e) = M_e$, so ψ is the identity map.

If n > 2, let $\tilde{\psi} : \tilde{U} \to \mathbb{C}^{\kappa^{n-1}}$ be the map associated to $T^r_{n-1}$. Then for $u \in U$,
define $\tilde{u} \in \tilde{U}$ by omitting from u the matrices associated to the edges $e_1, e_2$
of $T^r_n$ not in $T^r_{n-1}$. Then $\psi(u) = P$, where P is an n-dimensional tensor with
2-dimensional slices given by first letting $\tilde{P} = \tilde{\psi}(\tilde{u})$, $v = \tilde{P}(i_1, \ldots, i_{n-2}, \cdot)$ and
setting

\[
P(i_1, \ldots, i_{n-2}, \cdot, \cdot) = M_{e_1}^T \,\mathrm{diag}(v)\, M_{e_2}.
\]

For non-binary trees, modify this construction as indicated for Lemma 1. 

As in Lemma 1, one sees that this map is independent of the choice of cherries
determining the sequence $T^r_2, T^r_3, \ldots, T^r_n$. Although ψ apparently depends on
the choice of r, one can further check that if r is moved from one vertex of an
edge e to the other vertex, we need only transpose the matrix $M_e$ associated
to that edge and the map is unchanged. Thus the map is independent of
the choice of r, though our conception of how components of $\mathbb{C}^K$ are placed
into matrices does depend on r. Indeed, all these observations follow from the
observation that ψ can also be defined by a formula like that in Eq. (1), but
with the factor $\pi_r(b_r)$ omitted.

Proposition 3 The closure of $\psi(U)$ in $\mathbb{C}^L$ is the cone CV(T) over the phylogenetic
variety V(T).

PROOF. To see $\phi_r(S) \subset \psi(U)$, suppose $s = (\pi_r; \{M_e\}) \in S$. Let $e_0$ be the
one edge of $T^r_2$, and define $M'_{e_0} = \mathrm{diag}(\pi_r) M_{e_0}$. With $u = (M'_{e_0}, \{M_e\}_{e \ne e_0})$,
we find that $\phi_r(s) = \psi(u)$. Thus $V(T) \subset \overline{\psi(U)}$. Furthermore $\psi(U)$ is a cone,
since if $u = (\{M_e\}) \in U$ and $\lambda \in \mathbb{C}$, by picking any particular edge $e_0$ of T
and defining $u' \in U$ to be identical to u but with $\lambda M_{e_0}$ replacing $M_{e_0}$, then
$\psi(u') = \lambda \psi(u)$. Thus $CV(T) \subset \overline{\psi(U)}$.

We next show there is a non-empty open, and therefore dense, subset of U
whose image under ψ lies in the cone over $\phi_r(S)$, and hence in CV(T). This
will imply $\overline{\psi(U)} \subset CV(T)$.

For simplicity of exposition, assume $T^r$ is binary.

First, if n = 2, then $\phi_r(S)$ certainly contains those 2-dimensional arrays whose
entries add to 1 and none of whose row sums are 0. Now the subset of U on
which all row sums of $M_e$ (= u) are non-zero and the total sum of the entries
of $M_e$ (= u) is non-zero is an open set. The points in the image under ψ of this
open set lie in the cone over $\phi_r(S)$.

Proceeding inductively, let $e_1, e_2$ be the edges of $T^r_m$ which are not in $T^r_{m-1}$,
and $e_3$ the third edge meeting them. We may also suppose r does not lie at the
common vertex of these edges. Now there is an open $\mathcal{O}_1 \subset U$ such that for points
$u \in \mathcal{O}_1$, $M_{e_1}$ and $M_{e_2}$ have all row sums non-zero. Letting $D_i$ be the invertible
diagonal matrix constructed from the row sums of $M_{e_i}$, we may write

\[
M_{e_i} = D_i M'_{e_i}, \quad i = 1, 2,
\]

where $M'_{e_i}$ has rows summing to 1. Let $M'_{e_3} = M_{e_3} D_1 D_2$. Then for any $u \in \mathcal{O}_1$,
we define a new $u' \in \mathcal{O}_1$ as

\[
u' = \bigl(\{M_e\}_{e \notin \{e_1, e_2, e_3\}}, M'_{e_1}, M'_{e_2}, M'_{e_3}\bigr),
\]

so that $\psi(u') = \psi(u)$. Note that $\omega : \mathcal{O}_1 \to \mathcal{O}_1$ mapping $u \mapsto u'$ is given by
rational functions.

Let $\tilde{\psi} : \tilde{U} \to \mathbb{C}^{\kappa^{m-1}}$ and $\tilde{\phi}_r : \tilde{S} \to \mathbb{C}^{\kappa^{m-1}}$ be the parameterizations associated to
$T^r_{m-1}$. Then by induction there is a non-empty open $\tilde{\mathcal{O}} \subset \tilde{U}$ such that the image
of every point of $\tilde{\mathcal{O}}$ under $\tilde{\psi}$ lies in the cone over $\tilde{\phi}_r(\tilde{S})$. Then $\mathcal{O} = \omega^{-1}(\tilde{\mathcal{O}} \times \mathbb{C}^{2\kappa^2})$
is a non-empty open subset of U, and the image of any point of $\mathcal{O}$ under ψ
lies in the cone over $\phi_r(S)$.

If $T^r$ is not binary, slight modifications can be made to the above argument
to obtain the result. □
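
The renormalization step of the proof can be illustrated numerically. The Python sketch below is a hedged example, not part of the argument: it takes the 3-taxon tree rooted at the leaf $a_1$, with $e_3$ the edge from $a_1$ to the internal vertex and $e_1, e_2$ the cherry edges, picks arbitrary integer matrices with non-zero row sums, and checks that replacing $(M_{e_1}, M_{e_2}, M_{e_3})$ by $(M'_{e_1}, M'_{e_2}, M'_{e_3})$ leaves the tensor unchanged. The function names and matrix entries are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def psi(M3, M1, M2):
    """psi on the 3-taxon tree rooted at a1: sum over the state j at the internal vertex."""
    k = len(M3)
    return {
        (i1, i2, i3): sum(M3[i1][j] * M1[j][i2] * M2[j][i3] for j in range(k))
        for i1, i2, i3 in product(range(k), repeat=3)
    }


def normalize_rows(M):
    """Return the row sums d and the matrix D^{-1} M, whose rows sum to 1."""
    sums = [sum(row) for row in M]
    return sums, [[F(x) / s for x in row] for row, s in zip(M, sums)]


M3 = [[2, -1], [5, 3]]    # edge e3 (arbitrary entries, not a Markov matrix)
M1 = [[1, 2], [3, 5]]     # edge e1, row sums 3 and 8
M2 = [[4, -1], [2, 7]]    # edge e2, row sums 3 and 9

d1, M1_prime = normalize_rows(M1)
d2, M2_prime = normalize_rows(M2)
# M3' = M3 D1 D2 scales column j of M3 by d1[j] * d2[j].
M3_prime = [[M3[i][j] * d1[j] * d2[j] for j in range(2)] for i in range(2)]

before = psi(M3, M1, M2)
after = psi(M3_prime, M1_prime, M2_prime)
```

The factors $d_1(j)\, d_2(j)$ absorbed into column j of $M'_{e_3}$ cancel against the row normalizations of $M'_{e_1}$ and $M'_{e_2}$, so `before` and `after` agree exactly.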

While the definition of ψ has introduced many unnecessary parameters, in the
sense that the dimension of the image is much smaller than the dimension 
of the parameter space, it offers us the advantage of dropping inconvenient 
requirements — that row sums of vectors and matrices be 1 — that arose 
from the original probabilistic setting of the general Markov model. 



4 Flattenings and phylogenetic invariants 

To describe the set of phylogenetic invariants we are concerned with, we require 
the notion of flattening a tensor $P \in \mathbb{C}^{\kappa^n}$ according to an n-taxon tree T.

Let e be an edge of T. Then e induces a split of the taxa according to the 
connected components of T \ {e}. By reordering the indices in P if necessary, 
we may assume the split is $\{\{a_1, \ldots, a_k\}, \{a_{k+1}, \ldots, a_n\}\}$. The flattening of
P on e is the $\kappa^k \times \kappa^{n-k}$ matrix $F = \mathrm{Flat}_e(P)$ defined as follows: Fix any
ordering of $J_1 = [\kappa]^k$ and $J_2 = [\kappa]^{n-k}$, and for $u \in J_1$, $v \in J_2$, let

\[
F(u, v) = P(u_1, \ldots, u_k, v_1, \ldots, v_{n-k}).
\]
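
The flattening is a pure reindexing, which the following Python sketch makes explicit. It is an illustration under stated assumptions: the function `flatten` and its arguments are hypothetical, tensors are stored as dictionaries keyed by index tuples, states are numbered from 0, and lexicographic order is used for the arbitrary orderings of $J_1$ and $J_2$. The split is given by index positions, so no prior reordering of the taxa is needed.

```python
from itertools import product


def flatten(P, kappa, left, right):
    """Flatten the tensor P along a split of its index positions.

    left, right: tuples of index positions forming the two blocks of the split.
    Rows are indexed by the states at `left` and columns by the states at
    `right`, both in lexicographic order.
    """
    n = len(left) + len(right)
    rows = list(product(range(kappa), repeat=len(left)))
    cols = list(product(range(kappa), repeat=len(right)))
    flat = []
    for u in rows:
        row = []
        for v in cols:
            idx = [None] * n
            for pos, state in zip(left, u):
                idx[pos] = state
            for pos, state in zip(right, v):
                idx[pos] = state
            row.append(P[tuple(idx)])
        flat.append(row)
    return flat


# Toy 2 x 2 x 2 x 2 tensor whose entries record their own index as a string.
P = {i: "p" + "".join(map(str, i)) for i in product(range(2), repeat=4)}
flat = flatten(P, 2, left=(0, 1), right=(2, 3))
for row in flat:
    print(row)
```

With symbolic entries as above, each row of the output lists the entries sharing the same states at the first block of taxa.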

If the tensor $P = \phi_r(s)$ gives the joint distribution of states for some parameter
choice for the general Markov model on T, then $\mathrm{Flat}_e(P)$ can be thought of as a
joint distribution for a related graphical model with less complicated structure:
For a tree with at least 3 leaves, choose the root r to be at one vertex of the
edge e, and imagine at r a κ-state hidden variable. The possible joint states at
the taxa $a_1, \ldots, a_k$ are viewed as a single $\kappa^k$-state observed variable. Similarly,
the joint states at the taxa $a_{k+1}, \ldots, a_n$ are described through a single $\kappa^{n-k}$-state
variable. We thus have a "coarser" graphical model with one hidden
κ-state internal node and two descendent nodes with $\kappa^k$ and $\kappa^{n-k}$ states,
respectively, as depicted in Figure 2. The flattening of P simply prevents one
from examining the finer structure in the joint distribution array that arises
from the branching of T on either side of e.




Fig. 2. Flattening on an edge e 

From this interpretation one readily sees that for any $P \in \phi_r(S)$, $\mathrm{Flat}_e(P)$ has
rank at most κ. Indeed, for the coarser graphical model, the joint distribution
matrix must have the form

\[
\mathrm{Flat}_e(P) = M_1^T \,\mathrm{diag}(\pi_r)\, M_2,
\]

where $M_1$ and $M_2$ are $\kappa \times \kappa^k$ and $\kappa \times \kappa^{n-k}$ Markov matrices.

As a result, all (κ + 1) × (κ + 1) minors of $\mathrm{Flat}_e(P)$ must vanish. As is classically
known, such minors generate the full prime ideal of polynomials vanishing on
matrices of rank ≤ κ, and thus generate all invariants associated to the coarser
model. For the original model on T, these minors therefore give phylogenetic
invariants, which we call edge invariants associated to the edge e.
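
The vanishing of these minors can be observed on a small example. The Python sketch below is illustrative only: it assumes the 2-state model on the 4-taxon tree with cherries $\{a_1, a_2\}$ and $\{a_3, a_4\}$, rooted at the internal vertex x adjacent to $a_1, a_2$, with hypothetical parameter values, and uses exact rational arithmetic so that the minors are exactly zero rather than merely small.

```python
from fractions import Fraction as Fr
from itertools import product, combinations

kappa = 2
states = range(kappa)
pi = [Fr(1, 3), Fr(2, 3)]                             # root distribution at x
A1 = [[Fr(3, 4), Fr(1, 4)], [Fr(1, 5), Fr(4, 5)]]     # edge x -> a1
A2 = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 6), Fr(5, 6)]]     # edge x -> a2
B = [[Fr(9, 10), Fr(1, 10)], [Fr(1, 5), Fr(4, 5)]]    # internal edge x -> y
A3 = [[Fr(2, 3), Fr(1, 3)], [Fr(1, 4), Fr(3, 4)]]     # edge y -> a3
A4 = [[Fr(5, 6), Fr(1, 6)], [Fr(2, 5), Fr(3, 5)]]     # edge y -> a4


def p(i1, i2, i3, i4):
    """Joint probability, summing over the states at the internal vertices x and y."""
    return sum(pi[x] * A1[x][i1] * A2[x][i2] * B[x][y] * A3[y][i3] * A4[y][i4]
               for x in states for y in states)


# Flattening on the internal edge: rows (i1, i2), columns (i3, i4).
pairs = list(product(states, repeat=2))
flat = [[p(*u, *v) for v in pairs] for u in pairs]


def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [list(r) for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# All 3 x 3 minors of the 4 x 4 flattening, i.e. the edge invariants for this edge.
minors = [det3([[flat[r][c] for c in cols] for r in rows])
          for rows in combinations(range(4), 3)
          for cols in combinations(range(4), 3)]
```

For these parameters the flattening has rank exactly 2, and each of the 16 minors evaluates to 0.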

We denote by $\mathcal{F}_{edge}(T)$ the set of all (κ + 1) × (κ + 1) minors of all flattenings
of a κ × ⋯ × κ tensor of $\kappa^n$ indeterminates on edges of T. Of course the choice
of ordering of rows and columns in the flattening introduces factors of ±1, but
as our goal is to determine ideal generators, we may ignore this issue.

In Section 8 we will establish the following, which was conjectured in [16]. 
Theorem 4 For κ = 2 and any number of taxa n, the phylogenetic ideal
$\mathfrak{a}_T$ for the general Markov model on a binary n-taxon tree T is generated by
$\mathcal{F}_{edge}(T)$, the 3 × 3 minors of all edge flattenings of a 2 × ⋯ × 2 tensor of
indeterminates.

However, for larger κ it is not enough to consider only 2-dimensional edge flattenings
(i.e., flattenings to matrices) to obtain generators of the phylogenetic
ideal. This can be seen already for the 3-taxon tree. In this case, $\mathcal{F}_{edge}(T)$
is empty, but for any κ > 2 the phylogenetic ideal contains polynomials of
degree κ + 1 (see [1]; for κ = 3 see also [10]). Thus we need at least to consider
flattenings of P at internal nodes of T producing 3-dimensional tensors.
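
The emptiness of $\mathcal{F}_{edge}(T)$ for the 3-taxon tree is a matter of matrix sizes: every edge flattening is then a $\kappa \times \kappa^2$ matrix, which has no (κ + 1) × (κ + 1) minors. The short Python sketch below counts minors for a single flattening; the function name `num_edge_minors` is hypothetical, and the count ignores signs and any coincidences between minors of different flattenings.

```python
from math import comb


def num_edge_minors(kappa, k, n):
    """Number of (kappa+1) x (kappa+1) minors of a kappa^k x kappa^(n-k) flattening."""
    size = kappa + 1
    return comb(kappa ** k, size) * comb(kappa ** (n - k), size)


# 3-taxon tree: every edge is pendant, so k = 1 and there are no minors of this size.
print(num_edge_minors(3, 1, 3))   # 0
# 5-taxon tree, 2 states, internal edge with split sizes 2 and 3: a 4 x 8 matrix.
print(num_edge_minors(2, 2, 5))   # 4 * 56 = 224
```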

More specifically, consider a trivalent internal vertex v of a tree T, contained
in edges $e_1, e_2, e_3$. Then v induces a tripartition of the taxa according to the
connected components of $T \setminus \{v, e_1, e_2, e_3\}$. By reordering the indices in P if
necessary, we may assume the tripartition is

\[
\{\{a_1, \ldots, a_k\}, \{a_{k+1}, \ldots, a_{k+l}\}, \{a_{k+l+1}, \ldots, a_n\}\}.
\]

Then a flattening of P at v is a $\kappa^k \times \kappa^l \times \kappa^{n-k-l}$ array $F = \mathrm{Flat}_v(P)$ defined as
follows: Fix an ordering of $J_1 = [\kappa]^k$, $J_2 = [\kappa]^l$, and $J_3 = [\kappa]^{n-k-l}$, and for $u \in J_1$,
$v \in J_2$, $w \in J_3$, let

\[
F(u, v, w) = P(u_1, \ldots, u_k, v_1, \ldots, v_l, w_1, \ldots, w_{n-k-l}).
\]

As illustrated in Figure 3, we think of this flattening as producing a joint
distribution array associated to a graphical model with one hidden κ-state
internal node and three descendent nodes with $\kappa^k$, $\kappa^l$, and $\kappa^{n-k-l}$ states,
respectively. Similar to flattenings on edges, a flattening at an internal node
ignores the finer structure in the joint distribution array that arises from the
branching of T in the three directions leading away from v.

An ideal is associated to such a graphical model (1 hidden κ-state ancestral
node, 3 descendent nodes), and so to the flattening at a vertex. While we will
investigate such ideals further in Section 6, already we can formulate a natural
extension of the conjecture of [16].

Conjecture 5 For any κ and any number of taxa n, the phylogenetic ideal
$\mathfrak{a}_T$ for the general Markov model on a binary n-taxon tree T is the sum of the
ideals associated to the flattenings of P at vertices of T.

Fig. 3. Flattening at a vertex v

That this conjecture is identical to Theorem 4 when κ = 2 follows from work
of Landsberg and Manivel [15]. They show that in this special case the ideal 
associated to a vertex flattening is the sum of those associated to the edge 
flattenings on the three edges containing the vertex. (The Landsberg-Manivel 
result is a special case of a conjecture in [10]. We will give a new and simpler 
proof of this case, and several additional cases, as Corollary 14.) 

Of course the notion of flattening at a vertex can be extended in a straight- 
forward way for vertices of valence > 3, and the conjecture formulated for 
non-binary trees as well. The extended conjecture for non-binary trees remains
open even for κ = 2.

Although we will primarily need to refer to the 2- and 3-dimensional flattenings 
of a tensor P on an edge or at a vertex of a tree T, the notion naturally extends 
to flattenings based on any partition of the set of labels (taxa) associated 
to the indices of P. For instance, an n-dimensional κ × ⋯ × κ tensor P
with associated labels $a_1, \ldots, a_n$ can be flattened according to the partition
$\{\{a_1\}, \ldots, \{a_{n-2}\}, \{a_{n-1}, a_n\}\}$ to give an (n − 1)-dimensional $\kappa \times \cdots \times \kappa \times \kappa^2$
tensor. We use such a flattening, where $a_{n-1}, a_n$ are in a cherry, in Section 8.
Flattenings according to arbitrary bipartitions also appear in Section 6. 



5 The algebra of tensors, trees, and parameters 

In this section we define binary operations on trees, model parameters on 
trees, and tensors. These operations, all denoted by the same symbol *,
exhibit relationships that will make them useful in later sections. 

Tensors: If Q and R are m- and n-dimensional tensors of 'matching size' κ
in the last and first index respectively, then we define an l = (m + n − 2)-
dimensional tensor Q * R by

\[
(Q * R)(i_1, \ldots, i_l) = \sum_{j=1}^{\kappa} Q(i_1, \ldots, i_{m-1}, j)\, R(j, i_m, \ldots, i_l).
\]

For m = n = 2, this is of course just matrix multiplication. 
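
A minimal Python sketch of this product, with tensors stored as dictionaries keyed by index tuples, may help fix the index conventions. The helper names `star` and `as_tensor` are hypothetical, indices run from 0, and the example entries are arbitrary.

```python
from itertools import product


def star(Q, R):
    """(Q * R)(i_1..i_l) = sum_j Q(i_1..i_{m-1}, j) R(j, i_m..i_l)."""
    out = {}
    for q_idx, q_val in Q.items():
        for r_idx, r_val in R.items():
            if q_idx[-1] == r_idx[0]:          # contract last index of Q with first of R
                key = q_idx[:-1] + r_idx[1:]
                out[key] = out.get(key, 0) + q_val * r_val
    return out


def as_tensor(matrix):
    """Store a matrix given as a list of rows in the dictionary format."""
    return {(i, j): x for i, row in enumerate(matrix) for j, x in enumerate(row)}


A = as_tensor([[1, 2], [3, 4]])
B = as_tensor([[0, 1], [5, 6]])
AB = star(A, B)                                # ordinary matrix product for m = n = 2

# A 3-dimensional tensor of ones times a matrix is again 3-dimensional.
Q = {idx: 1 for idx in product(range(2), repeat=3)}
QB = star(Q, B)
```

Here `AB` reproduces the matrix product of the two 2 × 2 matrices, and each entry of `QB` is a column sum of B, since Q has all entries equal to 1.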

More generally, if the pth index of Q and the qth index of R both run through 
[κ], we may define $Q *_{p,q} R$ by a similar sum. However, to keep our notation
less cumbersome, we will generally try to express products using the last and 
first indices. 
Trees: Suppose T' is a tree with taxa $a_1, a_2, \ldots, a_m$, and T'' is a tree with
taxa $b_1, b_2, \ldots, b_n$. Then by T' * T'' we mean the (m + n − 2)-taxon tree with
taxa $a_1, \ldots, a_{m-1}, b_2, \ldots, b_n$ obtained by first identifying the vertices $a_m$ and
$b_1$, and then deleting this vertex, replacing the two edges it lies in with a single
conjoined edge, as illustrated in Figure 4.




Fig. 4. The * operation on trees

Parameters: Consider trees T', T'', and T = T' * T'' with m, n, and m + n − 2
taxa. Then from Section 3 we have the parameterizations

\[
\psi' : U' \to \mathbb{C}^{\kappa^m}, \qquad \psi'' : U'' \to \mathbb{C}^{\kappa^n}, \qquad \psi : U \to \mathbb{C}^{\kappa^{m+n-2}}
\]

of the cones over the associated phylogenetic varieties.

To impose directions on the edges of the trees for notational purposes, root T'
and T at $a_1$, and T'' at $b_1$. Then for $u' \in U'$, $u'' \in U''$, we define $u' * u'' \in U$ by
retaining for each edge of T except the conjoined one the matrix associated
to the edge in either u' or u'', and for the conjoined edge using the product of
the matrices in u' and u'' associated to its parts.

One readily sees that these three definitions imply the following. 

Lemma 6 $\psi(u' * u'') = \psi'(u') * \psi''(u'')$.

Lemma 7 If $T = T' * T''$, then $CV(T) = \overline{CV(T') * CV(T'')}$.

PROOF. It is clear that

$$U = U' * U'' = \{u' * u'' \mid u' \in U',\ u'' \in U''\}.$$

Thus by Lemma 6,

$$CV(T) = \overline{\psi(U)} = \overline{\psi'(U') * \psi''(U'')} = \overline{CV(T') * CV(T'')}.$$

□
This result will be strengthened in Corollary 21.

In the special case when $T''$ is a 2-taxon tree, $T' * T''$ is isomorphic to $T'$.
Then $\psi''(u'') = u''$ is simply a $\kappa \times \kappa$ matrix. Informally, one can
think of $\psi'(u') * \psi''(u'')$ as the result of 'extending' the edge of $T'$
terminating at $a_m$ and associating to the edge extension the matrix $u''$.

Considering invertible matrices $u''$, we get an action of $GL(\kappa, \mathbb{C})$ on
both $U'$ and $\psi'(U')$. Thus $GL(\kappa, \mathbb{C})$ acts on the closure, $CV(T')$,
as well. Viewing the action described here as operating in 'the last index' of a
tensor in $CV(T')$, we similarly have an action in the other indices. These actions
of $GL(\kappa, \mathbb{C})$ are of course just restrictions of the natural actions of
that group on the set of all $\kappa \times \cdots \times \kappa$ tensors: For
$j = 1, \dots, n$, the '$j$th index' action is defined by $P \mapsto P *_{j,1} A$ for
$A \in GL(\kappa, \mathbb{C})$.
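The action of a matrix in a chosen index of a tensor can be sketched numerically as below. This is an illustrative sketch only: the helper name `act`, the 0-based index convention, and the random test data are assumptions, not notation from the paper. It checks that acting by an invertible matrix and then by its inverse returns the original tensor, as a group action must.

```python
import numpy as np

def act(P, A, j):
    """Let the matrix A act in index j (0-based) of the tensor P."""
    out = np.tensordot(P, A, axes=([j], [0]))   # contracted index is dropped, new index goes last
    return np.moveaxis(out, -1, j)              # move the new index back into slot j

rng = np.random.default_rng(1)
kappa = 2
P = rng.standard_normal((kappa,) * 4)
A = rng.standard_normal((kappa, kappa))         # invertible with probability 1

round_trip = act(act(P, A, 1), np.linalg.inv(A), 1)
is_group_action = np.allclose(round_trip, P)
```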



6 Models on star trees 

In this section, we step back from the phylogenetic tree setting, and consider in 
more depth the hidden naive Bayes models of [10]. Most of our results will be 
needed for application to phylogenetic varieties. However, we develop this ma- 
terial in slightly greater generality than we need for phylogenetic applications, 
and so obtain partial results on a conjecture of [10] as well. 

The graphical models of this section are based on a star tree, as in Figure 5,
with one internal vertex $r$, connected by edges to $n$ leaves $a_1, a_2, \dots, a_n$.
A hidden random variable associated to $r$ has $\kappa$ possible states, with
probability distribution given by a vector $\pi_r$. Each leaf $a_i$ has associated to
it a random variable with $l_i$ states, and Markov matrices $M_i$ of size
$\kappa \times l_i$ give conditional probabilities of observing the various states at
$a_i$ given the state at $r$.



Fig. 5. Graphical depiction of a hidden naive Bayes model

As in the phylogenetic situation, such a model defines a projective variety, the
closure of the set of joint distributions of observations at the leaves arising
from this parameterization. We denote this variety by $V(\kappa; l_1, l_2, \dots, l_n)$,
and the homogeneous ideal defining it by $\mathfrak{a}(\kappa; l_1, l_2, \dots, l_n)$.

As pointed out in [10], the variety $V(\kappa; l_1, l_2, \dots, l_n)$ can be viewed more
geometrically as the $\kappa$-secant variety of the Segre product
$\mathbb{P}^{l_1 - 1} \times \mathbb{P}^{l_2 - 1} \times \cdots \times \mathbb{P}^{l_n - 1}$.
Here '$\kappa$-secant variety' means the closure of the union of the
$(\kappa - 1)$-dimensional affine spaces spanned by collections of $\kappa$ points on
the original variety, so, for instance, the 2-secant variety arises from points on
secant lines.

Note that $V(\kappa; \kappa, \kappa, \kappa) = V(T_3)$, the phylogenetic variety for a
$\kappa$-state, 3-taxon tree. The varieties
$V(\kappa; \kappa^k, \kappa^l, \kappa^{n-k-l})$, with $k, l, n - k - l \ge 1$, are the
ones that arose in Section 4, in the discussion of flattenings of tensors at
vertices of phylogenetic trees. Moreover, flattenings on edges involve
$V(\kappa; \kappa^k, \kappa^{n-k})$, the variety of rank $\kappa$ matrices of size
$\kappa^k \times \kappa^{n-k}$, which is well understood classically.

Our first goals are to show Theorems 10 and 11: Given a set $\mathcal{F}$ of
polynomials set-theoretically (respectively, ideal-theoretically) defining
$V(\kappa; \kappa, \kappa, \dots, \kappa)$ for the $n$-leaf star tree, then for any
$l_i \ge \kappa$ we can explicitly construct polynomials set-theoretically
(respectively, ideal-theoretically) defining $V(\kappa; l_1, l_2, \dots, l_n)$.

Previous to these theorems, we know of only one general result concerning
defining polynomials of $V(\kappa; l_1, l_2, \dots, l_n)$: When $\kappa = 2$, for any
number of leaves, [15] gives a natural set of polynomials defining the variety as
a set.

For our application to phylogenetic trees, the assumption that internal nodes
are trivalent means only the case $n = 3$ is needed. We therefore summarize
known results on $V(\kappa; \kappa, \kappa, \kappa) = V(T_3)$ for small $\kappa$.

For $\kappa = 2$, as noted in [10,1,15], $V(T_3) = \mathbb{P}^7$, and so $\{0\}$ is the
full prime ideal defining the variety.

For $\kappa = 3$, a generating set $\mathcal{F}$ for the prime ideal may be taken to be
the 27 quartic polynomials in [10], first found in [18] but also obtained from the
construction in [1].

For $\kappa \ge 4$, finding an explicit set $\mathcal{F}$ that even set-theoretically
defines the variety is still an open problem. However, any polynomial vanishing on
the variety must be of degree at least $\kappa + 1$.

When $\kappa = 4$, all degree 5 polynomials vanishing on the variety form an
explicitly-known 1728-dimensional vector space. This dimension is computed
in [11,15], and an explicit construction for general $\kappa$ is given in [1] that
produces a spanning set when $\kappa = 4$. Moreover, off another explicitly-known
variety, the vanishing of these polynomials does distinguish points of $V(T_3)$.
However, an explicit degree 9 polynomial is known which vanishes on
$V(4; 3, 3, 3)$ (see [10] for a statement, or [18] for the construction), and from
this polynomial one can obtain degree 9 polynomials vanishing on
$V(4; 4, 4, 4) = V(T_3)$ by evaluation on $3 \times 3 \times 3$ subarrays of a
$4 \times 4 \times 4$ tensor. By consideration of the multidegrees of each monomial
term in its many variables, one can show that these degree 9 polynomials cannot
be generated by the degree 5 ones.

We also note that if $\psi$ is the parameterization arising from the general Markov
model on $T_3$, then for all $\kappa \ge 2$ the image of $\psi$ is strictly smaller than
its closure $V(T_3)$. This is pointed out in Section 9 of [1], but in the
terminology of [18] is simply the statement that 'rank $\kappa$' is a strictly
stronger statement than 'border rank $\kappa$' for
$\kappa \times \kappa \times \kappa$ tensors.

By modifying the approach of Section 3, it is possible to parameterize a dense
subset of the cone $CV(\kappa; l_1, l_2, \dots, l_n)$ using parameters which are
arbitrary matrices. We leave the details to the reader, but denote this
parameterization by $\psi_{\kappa; l_1, \dots, l_n}$, where

$$\psi_{\kappa; l_1, \dots, l_n} : U_{\kappa; l_1, \dots, l_n} \to \mathbb{C}^L, \qquad U_{\kappa; l_1, \dots, l_n} = \mathbb{C}^{\kappa(l_1 + \cdots + l_n)}, \qquad L = l_1 l_2 \cdots l_n,$$

and if $P = \psi(M_1, M_2, \dots, M_n)$ then

$$P(i_1, \dots, i_n) = \sum_{k=1}^{\kappa} \prod_{j=1}^{n} M_j(k, i_j).$$

Here $M_j \in M(\kappa, l_j, \mathbb{C})$, the set of complex $\kappa \times l_j$ matrices.
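The star-tree parameterization can be sketched numerically as follows. This is a hedged illustration rather than the authors' construction: the function name `psi_star`, the helper `numerical_rank`, and the particular sizes are assumptions of the sketch. It builds the tensor whose entries are sums over the hidden state of products of matrix entries, and checks that a flattening on an edge has rank at most the number of hidden states.

```python
import numpy as np

def psi_star(mats):
    """P(i_1,...,i_n) = sum_k prod_j M_j[k, i_j] for kappa x l_j matrices M_j."""
    kappa = mats[0].shape[0]
    P = 0
    for k in range(kappa):
        term = np.ones(())                      # 0-dimensional start for the outer products
        for M in mats:
            term = np.multiply.outer(term, M[k])
        P = P + term
    return P

def numerical_rank(X, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(2)
kappa, sizes = 2, (3, 4, 5)                     # hypothetical kappa and l_1, l_2, l_3
mats = [rng.standard_normal((kappa, l)) for l in sizes]
P = psi_star(mats)

# Flattening on the edge to the last leaf gives a (3*4) x 5 matrix of rank <= kappa.
edge_rank = numerical_rank(P.reshape(3 * 4, 5))
```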

In order to relate $V(\kappa; l_1, l_2, \dots, l_n)$ to
$V(\kappa; \kappa, \kappa, \dots, \kappa)$ we need the following lemma. It can be
interpreted as describing the effect of extending one edge of the star tree, and
associating a (non-square) matrix to that extension, as was explained at the end
of Section 5.

Lemma 8 Let $P \in CV(\kappa; l_1, l_2, \dots, l_n)$ and let
$A \in M(l_n, l'_n, \mathbb{C})$. Then $A$ defines a map
$CV(\kappa; l_1, l_2, \dots, l_n) \to CV(\kappa; l_1, l_2, \dots, l'_n)$ by
$P \mapsto P * A$. Furthermore,

(i) If $\operatorname{rank}(A) = l'_n$, then $CV(\kappa; l_1, l_2, \dots, l_n) * A$ is
dense in $CV(\kappa; l_1, l_2, \dots, l'_n)$.

(ii) If $\kappa \le l_n$ then
$CV(\kappa; l_1, l_2, \dots, l_n) * M(l_n, l'_n, \mathbb{C})$ is dense in
$CV(\kappa; l_1, l_2, \dots, l'_n)$.

PROOF. Suppose first that $P = \psi_{\kappa; l_1, \dots, l_n}(M_1, M_2, \dots, M_n)$, with
complex $\kappa \times l_i$ matrix parameters $M_i$, $i = 1, 2, \dots, n$, associated to
the $n$ edges of $T$ directed away from the internal node. Then

$$P * A = \psi_{\kappa; l_1, \dots, l_{n-1}, l'_n}(M_1, M_2, \dots, M_{n-1}, M_n A),$$

hence $P * A \in CV(\kappa; l_1, l_2, \dots, l'_n)$. Since
$P * A \in CV(\kappa; l_1, l_2, \dots, l'_n)$ for $P$ in a dense subset of
$CV(\kappa; l_1, l_2, \dots, l_n)$, it follows that
$P * A \in CV(\kappa; l_1, l_2, \dots, l'_n)$ for all
$P \in CV(\kappa; l_1, l_2, \dots, l_n)$.

Now suppose $\operatorname{rank}(A) = l'_n$. Then as $M_n$ ranges through all
$\kappa \times l_n$ complex matrices, $M_n A$ ranges through all $\kappa \times l'_n$
complex matrices. Thus

$$\psi_{\kappa; l_1, \dots, l_n}(U_{\kappa; l_1, \dots, l_n}) * A = \psi_{\kappa; l_1, \dots, l_{n-1}, l'_n}(U_{\kappa; l_1, \dots, l'_n}),$$

and so a subset of $CV(\kappa; l_1, \dots, l_n) * A$ is dense in
$CV(\kappa; l_1, \dots, l'_n)$.

Finally suppose $\kappa \le l_n$. Then as $M_n$ ranges through all $\kappa \times l_n$
complex matrices and $A$ through all $l_n \times l'_n$ matrices, $M_n A$ ranges through
all $\kappa \times l'_n$ complex matrices. Thus

$$\psi_{\kappa; l_1, \dots, l_n}(U_{\kappa; l_1, \dots, l_n}) * M(l_n, l'_n, \mathbb{C}) = \psi_{\kappa; l_1, \dots, l_{n-1}, l'_n}(U_{\kappa; l_1, \dots, l'_n}).$$

Therefore a subset of $CV(\kappa; l_1, \dots, l_n) * M(l_n, l'_n, \mathbb{C})$ is dense in
$CV(\kappa; l_1, \dots, l'_n)$. □
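The key identity in the proof, that acting by a matrix in the last index of a parameterized point is the same as multiplying the last matrix parameter by that matrix, can be checked numerically. The following is a small sketch for three leaves; the function name `psi3` and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def psi3(M1, M2, M3):
    """Star-tree parameterization for n = 3 leaves (rows of each M index the hidden state)."""
    return np.einsum('ka,kb,kc->abc', M1, M2, M3)

rng = np.random.default_rng(3)
kappa, l1, l2, l3, l3_new = 2, 3, 3, 4, 2       # hypothetical sizes; l3_new plays the role of l'_n
M1 = rng.standard_normal((kappa, l1))
M2 = rng.standard_normal((kappa, l2))
M3 = rng.standard_normal((kappa, l3))
A = rng.standard_normal((l3, l3_new))

P = psi3(M1, M2, M3)
P_star_A = np.tensordot(P, A, axes=([2], [0]))  # P * A, acting in the last index
reparam = psi3(M1, M2, M3 @ A)                  # same point with parameter M3 replaced by M3 A
identity_holds = np.allclose(P_star_A, reparam)
```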

Remark 9 For non-zero P and A as in the proof, it is possible for P * A to 
be a zero tensor. Thus while the above lemma could be formulated in terms of 
a rational map between the underlying projective varieties, it is slightly easier 
for us to consider a polynomial map on the cones. 

By permuting indices Lemma 8 can be applied in any index, not just the last.
As shorthand, we will refer to this as letting an $l_k \times l'_k$ matrix 'act in the
$k$th index.' By considering only invertible $l_k \times l_k$ matrices, we have a group
action of $GL(l_k, \mathbb{C})$ in the $k$th index, and so an action of
$GL(l_1, \mathbb{C}) \times \cdots \times GL(l_n, \mathbb{C})$ on
$V(\kappa; l_1, \dots, l_n)$. While this group action underlies the dimension
computations of [15], our work will emphasize the utility of non-square and
non-invertible matrices as well.

Theorem 10 Consider an $n$-leaf star tree. Suppose $l_1, l_2, \dots, l_n \ge \kappa$. Let
$\mathcal{F}$ be any set of polynomials whose zero set is
$V(\kappa; \kappa, \kappa, \dots, \kappa)$. For $k = 1, 2, \dots, n$, let
$Z_k = (z^k_{ij})$ be $l_k \times \kappa$ matrices of indeterminates. For an
$l_1 \times l_2 \times \cdots \times l_n$ tensor $P$ of indeterminates, let $\tilde{P}$ be
the $\kappa \times \kappa \times \cdots \times \kappa$ tensor that results from letting
each $Z_k$ act formally in the $k$th index of $P$. Let $\tilde{\mathcal{F}}$ denote the set
of polynomials in the entries of $P$ obtained from those in $\mathcal{F}$ by
substituting into them the entries of $\tilde{P}$, expressing the results as
polynomials in the $z^k_{ij}$, and then extracting the coefficients. Let
$\mathcal{F}_{edge}$ denote the set of $(\kappa + 1) \times (\kappa + 1)$ minors of the
$n$ flattenings of $P$ on edges of the star tree. Finally, let

$$\mathcal{F}(\kappa; l_1, l_2, \dots, l_n) = \tilde{\mathcal{F}} \cup \mathcal{F}_{edge}.$$

Then $\mathcal{F}(\kappa; l_1, l_2, \dots, l_n)$ defines $V(\kappa; l_1, l_2, \dots, l_n)$
set-theoretically.

PROOF. We first observe that all polynomials in
$\mathcal{F}(\kappa; l_1, l_2, \dots, l_n)$ vanish on the cone
$CV(\kappa; l_1, l_2, \dots, l_n)$: Polynomials in $\mathcal{F}_{edge}$ must vanish
there, since the model has $\kappa$ states at the internal node, so all
2-dimensional flattenings on edges must have rank $\le \kappa$ on the parameterized
subset of the variety, and hence on the whole variety. Polynomials in
$\tilde{\mathcal{F}}$ must vanish there, since for all assignments of values to the
$z^k_{ij}$, if $P \in CV(\kappa; l_1, \dots, l_n)$ then, by Lemma 8,
$\tilde{P} \in CV(\kappa; \kappa, \dots, \kappa)$.

Now suppose all polynomials in $\mathcal{F}(\kappa; l_1, l_2, \dots, l_n)$ vanish on a
tensor $P_0 \in \mathbb{C}^{l_1 l_2 \cdots l_n}$. Then, flattening $P_0$ on the edge of the
tree leading to $a_n$ gives a matrix of rank $l \le \kappa$, so we can write

$$P_0 = Q'_0 * B'_n,$$

where $Q'_0$ is an $l_1 \times l_2 \times \cdots \times l_{n-1} \times l$ tensor and $B'_n$ is
an $l \times l_n$ matrix. Construct a $\kappa \times l_n$ matrix $B_n$ of rank $\kappa$ by
augmenting $B'_n$ with additional rows. Similarly augment $Q'_0$ with additional zero
entries to obtain an $l_1 \times l_2 \times \cdots \times l_{n-1} \times \kappa$ tensor
$Q_0$ with $P_0 = Q_0 * B_n$. Now there exists an $l_n \times \kappa$ matrix $A_n$ so that
$B_n A_n = I$, the identity matrix. Thus
$P_0 * A_n * B_n = Q_0 * B_n * A_n * B_n = Q_0 * I * B_n = P_0$.

Proceeding similarly for the other taxa, we obtain matrices $A_k$, $B_k$ such that

$$(P_0 *_{k,1} A_k) *_{k,1} B_k = P_0 *_{k,1} (A_k * B_k) = P_0.$$

By simultaneously letting each $A_k$ act in the $k$th index of $P_0$, we obtain a
$\kappa \times \kappa \times \cdots \times \kappa$ tensor $\tilde{P}_0$. Because all
polynomials in $\tilde{\mathcal{F}}$ vanish on $P_0$, all polynomials in $\mathcal{F}$
vanish on $\tilde{P}_0$. Thus by our choice of $\mathcal{F}$,
$\tilde{P}_0 \in CV(\kappa; \kappa, \kappa, \dots, \kappa)$. Since, by repeated
applications of Lemma 8, letting each $B_k$ act in the $k$th index maps
$CV(\kappa; \kappa, \kappa, \dots, \kappa)$ to $CV(\kappa; l_1, l_2, \dots, l_n)$, and maps
$\tilde{P}_0$ to $P_0$, we see $P_0 \in CV(\kappa; l_1, l_2, \dots, l_n)$. □
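The compression step of this proof can be illustrated numerically. In the sketch below, a tensor whose flattening on the edge to the last leaf has rank at most the number of hidden states is shrunk in its last index by one matrix and recovered by another. Using the singular value decomposition to pick these two matrices is an assumption of this sketch, chosen for convenience; the proof itself only augments a factor with extra rows, and all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
kappa, l1, l2, l3 = 2, 3, 3, 4                  # hypothetical sizes with l3 > kappa
M = [rng.standard_normal((kappa, l)) for l in (l1, l2, l3)]
P0 = np.einsum('ka,kb,kc->abc', *M)             # a point whose edge flattenings have rank <= kappa

F = P0.reshape(l1 * l2, l3)                     # flatten on the edge to the last leaf
_, _, Vt = np.linalg.svd(F, full_matrices=False)
B = Vt[:kappa]                                  # kappa x l3, rank kappa, row space contains that of F
A = B.T                                         # l3 x kappa, a right inverse of B: B @ A = I

P0_small = np.tensordot(P0, A, axes=([2], [0])) # l1 x l2 x kappa tensor, P0 * A
P0_back = np.tensordot(P0_small, B, axes=([2], [0]))  # (P0 * A) * B, which recovers P0
```

Repeating the same step in every index produces the small tensor on which the polynomials for the equal-size case are evaluated.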

We now state an ideal-theoretic version of this result. 

Theorem 11 Suppose $l_1, l_2, \dots, l_n \ge \kappa$, and $\mathcal{F}$ is a set of
polynomials generating $\mathfrak{a}(\kappa; \kappa, \kappa, \dots, \kappa)$. Then the set
$\mathcal{F}(\kappa; l_1, l_2, \dots, l_n)$ constructed from $\mathcal{F}$ as in
Theorem 10 generates $\mathfrak{a}(\kappa; l_1, l_2, \dots, l_n)$.

Since the key argument in the proof of Theorem 11 will be used again in 
Section 8, we present it as a lemma. 

Lemma 12 Let $V_1$ and $V_2$ be subvarieties of $\mathbb{C}^{n m_1}$ and
$\mathbb{C}^{n m_2}$, respectively, with $m_1 \le m_2$, such that, when points are
written as $n \times m_1$ and $n \times m_2$ matrices,

$$V_1 = V_1 * M(m_1, m_1, \mathbb{C}),$$

and

$$V_2 = \overline{V_1 * M(m_1, m_2, \mathbb{C})}.$$

Let $\mathfrak{a}_i$ denote the ideal of all polynomials vanishing on $V_i$.

Then $\mathfrak{a}_2$ is generated by the $(m_1 + 1) \times (m_1 + 1)$ minors of an
$n \times m_2$ matrix $P$ of indeterminates, together with all polynomials of the form
$f(P * A)$, where $f \in \mathfrak{a}_1$ and $A \in M(m_2, m_1, \mathbb{C})$.

PROOF. Let $\mathfrak{b}$ denote the ideal generated by the
$(m_1 + 1) \times (m_1 + 1)$ minors, together with the polynomials $f(P * A)$
described above.

First we show $\mathfrak{a}_2 \supseteq \mathfrak{b}$. It is enough to show the specified
generators of $\mathfrak{b}$ vanish on $V_1 * M(m_1, m_2, \mathbb{C})$. Since all points in
this set are matrices of rank at most $m_1$, the specified minors vanish there. To
see the $f(P * A)$ vanish there, consider a point $Q_0 * B$ where $Q_0 \in V_1$,
$B \in M(m_1, m_2, \mathbb{C})$. Then $Q_0 * B * A \in V_1$ since
$B * A \in M(m_1, m_1, \mathbb{C})$. Thus $f(P * A)$ vanishes at $Q_0 * B$.

Our argument that $\mathfrak{a}_2 \subseteq \mathfrak{b}$ is more involved.

Note $GL(m_2, \mathbb{C})$ acts on $V_1 * M(m_1, m_2, \mathbb{C})$, and hence on $V_2$ as
well. Consider the degree $m$ homogeneous component $\mathfrak{a}_2^{(m)}$ of
$\mathfrak{a}_2$. Then the $GL(m_2, \mathbb{C})$-action on $V_2$ gives a representation of
$GL(m_2, \mathbb{C})$ on $\mathfrak{a}_2^{(m)}$, in which $C \in GL(m_2, \mathbb{C})$ maps the
polynomial $g(P) \mapsto g(P * C)$. Since $GL(m_2, \mathbb{C})$ is reductive, this
representation decomposes into a sum of irreducible ones. Consider now one of the
irreducible subspaces, $W$. It will be enough to show that $W \subseteq \mathfrak{b}$.

Consider any non-zero polynomial $g(P) \in W$. Let $Q$ denote an $n \times m_1$ matrix
of indeterminates. Then for any $B \in M(m_1, m_2, \mathbb{C})$, the polynomial
$g_B(Q) = g(Q * B)$ vanishes on $V_1$, since $Q \mapsto Q * B$ maps $V_1$ to $V_2$. Thus
$g_B \in \mathfrak{a}_1$.

Suppose first that for all $B \in M(m_1, m_2, \mathbb{C})$ the polynomial $g_B(Q)$ is
identically zero. Then $g$ must vanish on all $n \times m_2$ matrices of rank at most
$m_1$, since any such matrix can be written as $Q_0 * B$ for some complex matrices
$Q_0 \in M(n, m_1, \mathbb{C})$, $B \in M(m_1, m_2, \mathbb{C})$, and then
$g(Q_0 * B) = g_B(Q_0) = 0$. Thus if all $g_B$ are identically zero, then $g$ is in the
ideal generated by $(m_1 + 1) \times (m_1 + 1)$ minors of $P$, and hence
$g \in \mathfrak{b}$.

Suppose, then, that for some $B$ the polynomial $g_B$ is not identically zero.
Let $D \in M(m_2, m_1, \mathbb{C})$ be chosen so that $h(P) = g_B(P * D)$ is a non-zero
polynomial. Such a $D$ must exist since $m_1 \le m_2$. (For instance, $D$ may be
taken so that its first $m_1$ rows form an identity and the remaining rows are
zero.) Then $h(P) = g(P * DB)$, where $DB$ is a complex $m_2 \times m_2$ matrix that
is generally not invertible.

Nonetheless, the irreducibility of $W$ implies that $h(P) \in W$. This is simply
because $W$ is closed in $\mathfrak{a}_2^{(m)}$, and so must contain the closure of the
orbit of $g$ under $GL(m_2, \mathbb{C})$, and this closure contains $g(P * DB)$.

Now since $g(P) \in W$, $h(P) = g_B(P * D) \in W$, and $g_B \in \mathfrak{a}_1$, the
irreducibility of $W$ implies $g(P)$ is in the span of polynomials of the form
$f(P * A)$ where $f \in \mathfrak{a}_1$ and $A \in M(m_2, m_1, \mathbb{C})$. Thus in this
case as well, $g \in \mathfrak{b}$. □

PROOF. [Proof of Theorem 11] Let $\mathfrak{a} = \mathfrak{a}(\kappa; l_1, \dots, l_n)$, and
let $\mathfrak{b}$ be the ideal generated by $\mathcal{F}(\kappa; l_1, \dots, l_n)$, the set
defined in Theorem 10. Note that $\mathfrak{b}$ is equivalently described as generated
by $\mathcal{F}_{edge} \cup \hat{\mathcal{F}}$, where $\hat{\mathcal{F}}$ denotes the set of
all polynomials of the form $f(\tilde{P})$ where $f \in \mathcal{F}$ and $\tilde{P}$ is
obtained from a tensor $P$ of indeterminates by the action of numerical matrices
$Z_k \in M(l_k, \kappa, \mathbb{C})$ in each index $k$.

That $\mathfrak{a} \supseteq \mathfrak{b}$ was shown in the proof of Theorem 10. To
establish $\mathfrak{a} \subseteq \mathfrak{b}$, we proceed by induction on the number of
indices $k$ such that $l_k > \kappa$, the base case of zero being trivial.

If at least one such $l_k > \kappa$ exists, we may assume $l_n > \kappa$. Then let
$V_1 = CV(\kappa; l_1, \dots, l_{n-1}, \kappa)$ and $V_2 = CV(\kappa; l_1, l_2, \dots, l_n)$.
We view points on $V_1$ and $V_2$ as $l_1 \cdots l_{n-1} \times \kappa$ and
$l_1 \cdots l_{n-1} \times l_n$ matrices, respectively, by flattening on the edge of the
star tree leading to the $n$th leaf. Using Lemma 8 we see that
$V_1 * M(\kappa, \kappa, \mathbb{C}) = V_1$ and, since $l_n > \kappa$, that
$V_2 = \overline{V_1 * M(\kappa, l_n, \mathbb{C})}$. Therefore we may apply Lemma 12, and
obtain that $\mathfrak{a}$ is generated by the $(\kappa + 1) \times (\kappa + 1)$ minors
of the flattening of $P$ on the edge to the $n$th leaf, together with all
polynomials $f(P * A)$ where $f \in \mathfrak{a}(\kappa; l_1, \dots, l_{n-1}, \kappa)$ and
$A \in M(l_n, \kappa, \mathbb{C})$. We thus need only show such $f(P * A)$ are in
$\mathfrak{b}$.

Now by induction, $\mathfrak{a}(\kappa; l_1, \dots, l_{n-1}, \kappa)$ is generated by
$(\kappa + 1) \times (\kappa + 1)$ minors of edge flattenings of an
$l_1 \times \cdots \times l_{n-1} \times \kappa$ tensor $Q$ of indeterminates, together
with polynomials of the form $h(\tilde{Q})$, where
$h \in \mathfrak{a}(\kappa; \kappa, \dots, \kappa)$ and $\tilde{Q}$ is a
$\kappa \times \cdots \times \kappa$ tensor obtained from $Q$ by letting elements of
$M(l_i, \kappa, \mathbb{C})$ (respectively $M(\kappa, \kappa, \mathbb{C})$) act on $Q$ in the
$i$th index for each $i \ne n$ (respectively $i = n$). We may thus assume $f$ itself
has one of these forms.

In the first case, where $f \in \mathfrak{a}(\kappa; l_1, \dots, l_{n-1}, \kappa)$ is a minor
of an edge flattening for the model, we see $f$ vanishes on all tensors $Q$ that
have rank at most $\kappa$ when flattened on a certain edge $e$ not leading to the
$n$th leaf. But if $P$ is an $l_1 \times \cdots \times l_n$ tensor with
$\operatorname{rank}(Flat_e(P)) \le \kappa$, then
$\operatorname{rank}(Flat_e(P * A)) \le \kappa$ as well, for all
$A \in M(l_n, \kappa, \mathbb{C})$. Thus $f(P * A)$ vanishes on all tensors such that
$\operatorname{rank}(Flat_e(P)) \le \kappa$, and so $f(P * A)$ is in the ideal generated
by $(\kappa + 1) \times (\kappa + 1)$ minors from edge flattenings of $P$.

In the second case, where $f = h(\tilde{Q})$, we find $f(P * A) = h(\tilde{P})$ where
$\tilde{P}$ is obtained from $P$ by letting elements of $M(l_i, \kappa, \mathbb{C})$ act on
$P$ in the $i$th index for each $i$, and $h$ vanishes on
$V(\kappa; \kappa, \dots, \kappa)$.

Thus in either case $f(P * A) \in \mathfrak{b}$. □
Remark 13 It is natural to ask whether a smaller set of polynomials than the
set described above (namely, those constructed by evaluation of polynomials
in $\mathcal{F}$ on all $\kappa \times \cdots \times \kappa$ subarrays of an
$l_1 \times \cdots \times l_n$ array of indeterminates) is sufficient to define the
variety $V(\kappa; l_1, \dots, l_n)$. Indeed, Lemma 15 below shows it does in the
special case $\kappa = 2$, assuming elements of $\mathcal{F}$ have a special form.

However, in general this subset does not even define the variety set-theoretically.
To see this, consider the $3 \times 3 \times 4$ tensor

$$P = e_1 \otimes e_1 \otimes f_1 + e_2 \otimes e_2 \otimes f_2 + e_3 \otimes e_3 \otimes f_3 + e_1 \otimes e_2 \otimes f_4,$$

where the $e_i$ are the standard basis vectors for $\mathbb{C}^3$ and the $f_i$ the
standard basis vectors for $\mathbb{C}^4$. That all $3 \times 3 \times 3$ subarrays of $P$
are in $V(3; 3, 3, 3)$ is clear from the form of $P$. One can verify that
$P \notin V(3; 3, 3, 4)$ by checking the non-vanishing at $P$ of some of the
polynomials constructed in Theorem 11.
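A quick numerical check of this example can be sketched as follows. The exact subscripts of the tensor are an assumption of this sketch: it reads the example as a sum of four rank-one terms, one for each standard basis vector of the 4-dimensional factor, placed at positions (1,1), (2,2), (3,3), and (1,2) of the two 3-dimensional factors. Under that reading, every 3 x 3 x 3 subarray is a sum of at most three rank-one terms, while the flattening on the edge to the third leaf has rank 4, so some 4 x 4 minor of that flattening is non-zero.

```python
import numpy as np

# Assumed reading: P = e1(x)e1(x)f1 + e2(x)e2(x)f2 + e3(x)e3(x)f3 + e1(x)e2(x)f4 (0-based below).
P = np.zeros((3, 3, 4))
for i, j, k in [(0, 0, 0), (1, 1, 1), (2, 2, 2), (0, 1, 3)]:
    P[i, j, k] = 1.0

# Each 3 x 3 x 3 subarray (drop one slice of the last index) is a sum of its three
# slices, so the sum of the slice matrix ranks bounds its tensor rank from above.
slice_rank_bounds = []
for dropped in range(4):
    sub = np.delete(P, dropped, axis=2)
    slice_rank_bounds.append(sum(np.linalg.matrix_rank(sub[:, :, k]) for k in range(3)))

# The flattening on the edge to the third leaf is a 9 x 4 matrix.
edge_rank = np.linalg.matrix_rank(P.reshape(9, 4))
```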

As a corollary to Theorem 11, we prove several cases of Conjecture 21 in [10]
on the ideals $\mathfrak{a}(2; l_1, \dots, l_n)$. We note the $n = 3$ case was first
proved in [15] by invoking sophisticated methods of Weyman [21].

Corollary 14 For $n \le 5$, the ideal $\mathfrak{a}(2; l_1, \dots, l_n)$ associated to the
hidden naive Bayes model with a 2-state hidden variable and $n$ observed variables
with $l_1, \dots, l_n$ states, is generated by the $3 \times 3$ minors of all 2-dimensional
flattenings associated to bipartitions of the observed variables.

PROOF. Since there are no polynomials vanishing on $V(2; 2, 2, 2) = \mathbb{P}^7$, by
Theorem 11 the set of polynomials vanishing on $V(2; l_1, l_2, l_3)$ is generated by
edge invariants.

By calculations of [10], the statement holds for the two cases $V(2; 2, 2, 2, 2)$
and $V(2; 2, 2, 2, 2, 2)$. The corollary then follows from Lemma 15 below. □

Lemma 15 Suppose, for the $n$-leaf star tree, that the ideal
$\mathfrak{a}(2; 2, \dots, 2)$ is generated by the $3 \times 3$ minors of all 2-dimensional
flattenings of $2 \times \cdots \times 2$ tensors according to bipartitions of the
observed variables. Then $\mathfrak{a}(2; l_1, \dots, l_n)$ is generated by the
$3 \times 3$ minors of all 2-dimensional flattenings of $l_1 \times \cdots \times l_n$
tensors according to bipartitions of the observed variables.

PROOF. By Theorem 11, $\mathfrak{a}(2; l_1, \dots, l_n)$ is generated by all $3 \times 3$
minors of edge flattenings of an $l_1 \times \cdots \times l_n$ tensor of indeterminates
$P$, together with all $3 \times 3$ minors of all 2-dimensional flattenings of all
$\tilde{P}$, where $\tilde{P}$ denotes a $2 \times \cdots \times 2$ tensor obtained from
$P$ by an action in each index $i$ by matrices $A_i \in M(l_i, 2, \mathbb{C})$. One
readily sees such flattenings of $\tilde{P}$ can be expressed as
$\tilde{F} = B_1 * F * B_2$, where $F$ is the corresponding flattening of $P$ and the
$B_j$ are matrices depending on the $A_i$. But then the $3 \times 3$ minors of such a
flattening of $\tilde{P}$ will be zero provided $F$ has rank $\le 2$. Thus these
polynomials are in the ideal $\mathfrak{b}$ generated by $3 \times 3$ minors of
flattenings of $P$. Therefore $\mathfrak{a}(2; l_1, \dots, l_n) \subseteq \mathfrak{b}$.

That $\mathfrak{a}(2; l_1, \dots, l_n) \supseteq \mathfrak{b}$ is clear. □
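The factorization of a flattening used in this proof can be checked numerically. The sketch below, for four leaves, assumes the two outer matrices are built as Kronecker products of the matrices acting in each index (with the left one transposed); that concrete form, the helper `numerical_rank`, and all sizes are assumptions made for illustration, since the text only states that such matrices exist.

```python
import numpy as np

def numerical_rank(X, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(6)
ls = (3, 3, 3, 3)                                  # hypothetical l_1, ..., l_4
M = [rng.standard_normal((2, l)) for l in ls]
P = np.einsum('ka,kb,kc,kd->abcd', *M)             # a point of the 2-state model with n = 4
A = [rng.standard_normal((l, 2)) for l in ls]      # one l_i x 2 matrix per index

P_tilde = np.einsum('abcd,ai,bj,ck,dl->ijkl', P, *A)

# Flatten both tensors on the bipartition {1,2} | {3,4}.
F = P.reshape(9, 9)
F_tilde = P_tilde.reshape(4, 4)
B1 = np.kron(A[0], A[1]).T                         # 4 x 9
B2 = np.kron(A[2], A[3])                           # 9 x 4
identity_holds = np.allclose(F_tilde, B1 @ F @ B2)
ranks = (numerical_rank(F), numerical_rank(F_tilde))
```

Since the flattening of the small tensor is a product involving the flattening of the large one, its rank cannot exceed that of the large flattening, which is why its 3 x 3 minors vanish whenever the large flattening has rank at most 2.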



A proof of the full Conjecture 21 of [10] will therefore follow from the following
special cases:

Conjecture 16 (Garcia, Stillman, Sturmfels) The ideal $\mathfrak{a}(2; 2, 2, \dots, 2)$,
that is, the ideal associated to the hidden naive Bayes model with a 2-state hidden
variable and $n$ 2-state observed variables, is generated by the $3 \times 3$ minors of
all 2-dimensional flattenings arising from bipartitions of the observed variables.



7 Set-theoretic description of the phylogenetic variety: arbitrary $\kappa$.

For the remainder of this paper, we return to the consideration of models on
phylogenetic trees. We first establish a set-theoretic result that provides some
evidence for Conjecture 5, for arbitrary $\kappa$.

Theorem 17 For an $n$-taxon tree $T$, let $\mathcal{F}(T)$ be the union of all sets of
polynomials $\mathcal{F}(\kappa; l_1, l_2, \dots, l_n)$, defined as in Theorem 10,
associated to flattenings at nodes of $T$. Then the zero set of $\mathcal{F}(T)$ is the
phylogenetic variety $V(T)$.

More informally, in conjunction with Theorem 10 this means that from polynomials
whose zero set is $V(\kappa; \kappa, \dots, \kappa)$ one can explicitly construct
polynomials whose zero set is $V(T)$ for any $n$-taxon tree $T$.

In particular, knowledge of set-theoretic defining polynomials for $V(T_3)$ is
sufficient to give set-theoretic defining polynomials for $V(T)$ for any binary
tree $T$. Thus while one might naively view the case of $V(T_3)$ as the simplest,
in fact it is the only remaining barrier to the determination of polynomials
defining the binary $n$-taxon variety, for any $n$. In the cases $\kappa = 2, 3$ where
such defining polynomials are known, we thus obtain the following.

Corollary 18 For $\kappa = 2$ or 3, and any binary tree $T$, explicit polynomials
set-theoretically defining $V(T)$ can be given.

Note that for $\kappa = 2$ a stronger result is provided by Theorem 4, to be proved
in Section 8.
For the remainder of this section let $V_{Flat}(T)$ denote the zero set of
$\mathcal{F}(T)$. Our proof of Theorem 17 will follow several lemmas. The first is an
analog for $V_{Flat}(T)$ of Lemma 7.

Lemma 19 Let $T'$ and $T''$ be $n$-taxon and $m$-taxon trees, with $T = T' * T''$.
If $Q \in CV_{Flat}(T')$ and $R \in CV_{Flat}(T'')$, then $Q * R \in CV_{Flat}(T)$.

PROOF. Consider any internal node $v$ of $T$, which we may assume arises
from an internal node of $T'$. We assume $v$ is trivalent; straightforward
modifications to our argument give the general case.

Flattening $Q$ at $v$, the resulting tensor lies on
$CV(\kappa; \kappa^{n_1}, \kappa^{n_2}, \kappa^{n_3})$, with $n_3 = n - n_1 - n_2$, where
we assume taxon $a_n$ of $T'$ (where taxon $b_1$ of $T''$ is to be joined) is included
in the last index of the flattening. Then the flattening of $Q * R$ at $v$ is
obtained from the flattening of $Q$ at $v$ by an action in the third index by a
matrix $R'$ whose entries are determined by those of $R$. By Lemma 8 the
flattening of $Q * R$ at $v$ lies in
$CV(\kappa; \kappa^{n_1}, \kappa^{n_2}, \kappa^{n_3 + m - 2})$.

Thus $Q * R \in CV_{Flat}(T)$. □

We also need a converse to this lemma. 

Lemma 20 Let $T'$ and $T''$ be $n$-taxon and $m$-taxon trees, with $T = T' * T''$.
Then if $P \in CV_{Flat}(T)$, there exist $Q \in CV_{Flat}(T')$ and
$R \in CV_{Flat}(T'')$ with $P = Q * R$.

PROOF. Let $e$ be the edge of $T$ formed by conjoining edges of $T'$ and $T''$. Since
any $P \in CV_{Flat}(T)$ satisfies the edge invariants for $e$, we may flatten it on
$e$ to obtain a $\kappa^{n-1} \times \kappa^{m-1}$ matrix of rank $l \le \kappa$, and write

$$P = Q * R,$$

where $Q$ and $R$ are $n$- and $m$-dimensional tensors, respectively, with all indices
running through $[\kappa]$. We may further assume the non-zero
$Q_k = Q(\cdot, \dots, \cdot, k)$ are linearly independent, as are the non-zero
$R_k = R(k, \cdot, \dots, \cdot)$, and that $Q_k$, $R_k$ are non-zero only for
$k = 1, \dots, l \le \kappa$.

We next show $Q \in CV_{Flat}(T')$. First observe that since the non-zero $R_k$ are
independent, if we write them as row vectors, there is a $\kappa^{m-1} \times \kappa$
matrix $A$ so that $R_k A = e_k$ for all $k \le l$. Now supposing the taxa of $T'$ and
$T''$ are $a_1, \dots, a_n$ and $b_1, \dots, b_m$, respectively, flatten $P$ according to
the partition $\{\{a_1\}, \dots, \{a_{n-1}\}, \{b_2, \dots, b_m\}\}$ to an $n$-dimensional
$\kappa \times \cdots \times \kappa \times \kappa^{m-1}$ tensor $F$. Letting $R'$ denote
the $\kappa \times \kappa^{m-1}$ flattened form of $R$ with rows $R_k$, we have
$F = Q * R'$. Thus $F * A = Q * R' * A = Q$. (Note that $A$ does not act in a single
index of $P$ here, but does act in a single index of the flattening $F$.) It is now
straightforward to see that any flattening of $Q$ at an internal vertex of $T'$ is
obtained from a flattening of $P$ at a vertex of $T$, followed by an action in one
of the resulting indices of a matrix determined by $A$. Thus by the definition
of $\mathcal{F}(T)$, $Q$ will satisfy all polynomials in $\mathcal{F}(T')$.

Similarly, $R \in CV_{Flat}(T'')$. □
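The factorization step at the start of this proof can be illustrated numerically for the smallest interesting case, a 4-taxon tree obtained by joining two 3-taxon trees with 2 states. Using the singular value decomposition to produce the two factors is an assumption of this sketch (the proof only needs that some rank factorization exists), and the names below are hypothetical.

```python
import numpy as np

def star(Q, R):
    """Q * R: contract the last index of Q with the first index of R."""
    return np.tensordot(Q, R, axes=([Q.ndim - 1], [0]))

rng = np.random.default_rng(7)
kappa = 2
Q0 = rng.standard_normal((kappa,) * 3)
R0 = rng.standard_normal((kappa,) * 3)
P = star(Q0, R0)                                 # 4-index tensor for the joined tree

# Flatten on the conjoined edge: rows index the first two taxa, columns the last two.
F = P.reshape(kappa**2, kappa**2)
U, s, Vt = np.linalg.svd(F)                      # F has rank <= kappa by construction
Q = (U[:, :kappa] * s[:kappa]).reshape(kappa, kappa, kappa)   # slices Q(., ., k)
R = Vt[:kappa].reshape(kappa, kappa, kappa)                   # slices R(k, ., .)

factorization_ok = np.allclose(star(Q, R), P)
```

The recovered factors generally differ from the ones used to build the tensor, since a rank factorization is only determined up to an invertible change of basis in the shared index.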



PROOF. [Proof of Theorem 17] We already know that $V_{Flat}(T) \supseteq V(T)$.

The proof that $V_{Flat}(T) = V(T)$ proceeds by induction on the number $n$ of
taxa. The cases of $n = 2, 3$ hold by the definition of $\mathcal{F}(T)$.

For simplicity, we first consider a binary tree $T = T_n$, with $n \ge 4$ taxa.
Picking a cherry of $T$, let $T_{n-1}$ and $T_3$ be such that $T = T_{n-1} * T_3$. Suppose
$P \in CV_{Flat}(T)$. By Lemma 20, we have $P = Q * R$, for $Q \in CV_{Flat}(T_{n-1})$ and
$R \in CV_{Flat}(T_3)$. This, in combination with Lemma 19, means the map

$$\mu : CV_{Flat}(T_{n-1}) \times CV_{Flat}(T_3) \to CV_{Flat}(T_n)$$

defined by $(Q, R) \mapsto Q * R$ is surjective.

Denote the parameterizations of the cones over the phylogenetic varieties for
$T_k$ by $\psi_k : U_k \to \mathbb{C}^{L_k}$. With the map
$\sigma : U_{n-1} \times U_3 \to U_n$ defined by $\sigma(u_{n-1}, u_3) = u_{n-1} * u_3$,
the diagram

$$\begin{array}{ccc}
U_{n-1} \times U_3 & \xrightarrow{\ \psi_{n-1} \times \psi_3\ } & CV_{Flat}(T_{n-1}) \times CV_{Flat}(T_3) \\
\Big\downarrow {\scriptstyle \sigma} & & \Big\downarrow {\scriptstyle \mu} \\
U_n & \xrightarrow{\ \psi_n\ } & CV_{Flat}(T)
\end{array}$$

commutes, by Lemma 6.

Now $\sigma$ and $\mu$ are surjective, and by the inductive hypothesis the image of
$\psi_{n-1} \times \psi_3$ is dense in $CV_{Flat}(T_{n-1}) \times CV_{Flat}(T_3)$, so the
image of $\psi_n$ is dense in $CV_{Flat}(T)$. Thus $V_{Flat}(T) = V(T)$.

If $T$ is not binary, the above argument may be modified by replacing the
decomposition $T = T_{n-1} * T_3$ by $T = T_{n-k+1} * T_{k+1}$ where $T_{k+1}$ is a star
tree with $k + 1$ leaves and $T_{n-k+1}$ has $n - k + 1$ leaves, if necessary. □



Theorem 17 and the preceding lemmas yield the following strengthening of
Lemma 7.

Corollary 21 If $T = T' * T''$, then $CV(T) = CV(T') * CV(T'')$.



8 The phylogenetic ideal: Binary $T$ and $\kappa = 2$.

We now prove Theorem 4, and thus assume $T$ is a binary tree and $\kappa = 2$.

Our arguments will use in several ways the fact (see Section 6) that for
$\kappa = 2$ the variety $V(T_3)$ fills its ambient space: $V(T_3) = \mathbb{P}^7$. Note,
however, that for $\kappa > 2$, $V(T_3) \subsetneq \mathbb{P}^{\kappa^3 - 1}$, and so the
approach here cannot be successfully modified in a simple way.

The first use of this special fact is to note that for our chosen $\kappa$,
$V(2; 2, 2, 2) = V(T_3) = \mathbb{P}^7$ means the set $\mathcal{F}$ defining $V(T_3)$ is
$\{0\}$. Thus the set $\mathcal{F}(T_n)$ of the set-theoretic result Theorem 17 is the
set of edge invariants. While our goal is to show $\mathcal{F}(T_n)$ generates the full
ideal vanishing on $V(T_n)$, we will not, in fact, appeal to Theorem 17 to do so.

The second use of $V(T_3) = \mathbb{P}^7$ is more subtle. Recall that regardless of
$\kappa$, there are actions of $GL(\kappa, \mathbb{C})$ on $V(T_n)$ in each index. However,
in the case $\kappa = 2$, the special nature of $V(T_3)$ gives us actions of
$GL(4, \mathbb{C})$ on $V(T_n)$ via the cherries of $T_n$. This is really the key point in
our argument, as it underlies the application of Lemma 12. Nonetheless, this
action is in some respect an 'unnatural' consequence of $\kappa = 2$. The following
lemma provides a more careful statement of the special structure we use.

Lemma 22 Let $T_n$ denote a binary $n$-taxon tree, labeled so that taxa $a_{n-1}$
and $a_n$ form a cherry. Write $T_n = T_{n-1} * T_3$, where $a_{n-1}, a_n$ are taxa on
$T_3$. Let $e$ denote the edge of $T_n$ formed from conjoining edge $\tilde{e}$ of
$T_{n-1}$ and the appropriate edge of $T_3$. View points in $CV(T_n)$ and
$CV(T_{n-1})$ as $2^{n-2} \times 4$ and $2^{n-2} \times 2$ matrices by flattening them on
the edges $e$ and $\tilde{e}$, respectively. Then

$$CV(T_n) = \overline{CV(T_{n-1}) * M(2, 4, \mathbb{C})}$$

and

$$CV(T_{n-1}) = CV(T_{n-1}) * M(2, 2, \mathbb{C}).$$

PROOF. The first claim is simply Lemma 7 applied to $T_{n-1}$ and $T_3$, combined
with the observation that $CV(T_3) = \mathbb{C}^8$ flattens to give
$M(2, 4, \mathbb{C})$. (Note that by Corollary 21, we could also remove the closure
symbol here.)

For the second claim, apply the same argument to $T_{n-1}$ and $T_2$, observing that
$CV(T_2) = M(2, 2, \mathbb{C})$. □
PROOF. [Proof of Theorem 4] We proceed by induction on the number $n$ of
taxa for $T_n$, with the cases of $n = 2, 3$ known.

Let $\mathfrak{a} = \mathfrak{a}_T$ denote the ideal vanishing on $CV(T)$, and
$\mathfrak{b}$ the ideal generated by $\mathcal{F}_{edge}(T)$. That
$\mathfrak{a} \supseteq \mathfrak{b}$ has been discussed already; we must show the
opposite inclusion.

With $T_n = T$, choose a cherry so that $T_n = T_{n-1} * T_3$, with notation as in
Lemma 22. By that lemma, we may apply Lemma 12 with $V_1 = CV(T_{n-1})$ and
$V_2 = CV(T_n)$. We thus find $\mathfrak{a}$ is generated by the $3 \times 3$ minors of
the edge flattening $Flat_e(P)$ on the conjoined edge $e$ of an $n$-dimensional
tensor of indeterminates $P$, together with all polynomials of the form
$g(P) = f(Flat_e(P) * B)$ where $f(Q)$ vanishes on $CV(T_{n-1})$, $Q$ is an
$(n - 1)$-dimensional tensor of indeterminates, and $B \in M(4, 2, \mathbb{C})$.

Now, by induction, the ideal of such $f$ is generated by $3 \times 3$ minors of
$Flat_{e'}(Q)$ as $e'$ ranges through edges of $T_{n-1}$. Consider one such minor, say
$f_0$, obtained from the flattening on an edge $e_0$ of $T_{n-1}$. We may assume
$e_0 \ne \tilde{e}$, where $\tilde{e}$ is the edge of $T_{n-1}$ conjoined to form $e$, since
otherwise there are no $3 \times 3$ minors. It will be enough to show
$f_0(Flat_e(P) * B) \in \mathfrak{b}$.

We claim that $f_0(Flat_e(P) * B)$ vanishes on all $P$ that have rank at most 2
when flattened on the edge $e_0$ in $T_n$. For such a $P$, since $Flat_{e_0}(P)$ is
$2^m \times 2^{n-m}$, there is an expression $P = P_1 * P_2$, where $P_1$ is an
$(m + 1)$-dimensional $2 \times \cdots \times 2$ tensor, and $P_2$ an
$(n - m + 1)$-dimensional $2 \times \cdots \times 2$ tensor. Then writing $P$ and $P_2$
as $2 \times \cdots \times 2 \times 4$ tensors by flattening to combine the taxa
$a_{n-1}, a_n$, we have $P * B = P_1 * (P_2 * B)$. This shows $P * B$ also has rank at
most 2 when flattened on $e_0$, and so $f_0$ vanishes on it, as claimed.

But since $f_0(Flat_e(P) * B)$ vanishes on all $P$ of rank at most 2 when flattened
on $e_0$, it is contained in the ideal generated by $3 \times 3$ minors of flattenings
on $e_0$. Thus it is in $\mathfrak{b}$. □
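For the smallest case, a 4-taxon binary tree with 2 states, the edge invariants can be evaluated numerically. The sketch below is an illustration under stated assumptions: it builds a point of the cone by joining two arbitrary 2 x 2 x 2 tensors along the internal edge, and the helper name `minors3` is hypothetical. The 3 x 3 minors of the flattening on the tree's internal edge vanish up to rounding, while those of the flattening on a split that is not an edge of the tree generically do not.

```python
import numpy as np
from itertools import combinations

def minors3(F):
    """All 3 x 3 minors of the matrix F."""
    rows, cols = F.shape
    return [np.linalg.det(F[np.ix_(r, c)])
            for r in combinations(range(rows), 3)
            for c in combinations(range(cols), 3)]

rng = np.random.default_rng(8)
Q = rng.standard_normal((2, 2, 2))               # any 2 x 2 x 2 tensor is a point of the 3-taxon cone
R = rng.standard_normal((2, 2, 2))
P = np.tensordot(Q, R, axes=([2], [0]))          # 4-taxon tree with split {a1,a2} | {a3,a4}

F_tree = P.reshape(4, 4)                         # flattening on the internal edge of the tree
F_other = P.transpose(0, 2, 1, 3).reshape(4, 4)  # flattening on the split {a1,a3} | {a2,a4}

max_tree_minor = max(abs(m) for m in minors3(F_tree))
max_other_minor = max(abs(m) for m in minors3(F_other))
```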